Take the various file (EIA facilities, EIA state-level totals, EPA emissions, etc) and combine them to determine total generation, CO₂ emissions, etc nationally and at NERC regions.
Make sure the file_date
parameter below is set to whatever value you would like appended to file names.
In Calculate the extra gen/fuel consumption at the state levels add years after 2017 in the for
loop if needed.
In Fraction of estimated gen/fuels in each NERC region update the years that plant ids for annual plants are fetched from EIA-923 files.
The entire notebook can be run at once using Run All Cells
In [1]:
import pandas as pd
import numpy as np
import os
from os.path import join
import sys
import json
idx = pd.IndexSlice
cwd = os.getcwd()
data_path = join(cwd, '..', 'Data storage')
In [2]:
file_date = '2018-03-06'
In [3]:
%load_ext watermark
%watermark -iv -v
In [4]:
# Load the "autoreload" extension
%load_ext autoreload
# always reload modules marked with "%aimport"
%autoreload 1
In [5]:
# add the 'src' directory as one where we can import modules
src_dir = join(os.getcwd(), os.pardir, 'src')
sys.path.append(src_dir)
In [6]:
%aimport Data.make_data
from Data.make_data import states_in_nerc
%aimport Analysis.index
from Analysis.index import facility_emission_gen, group_facility_data, add_quarter
%aimport Analysis.index
from Analysis.index import g2lb, change_since_2005, generation_index
%aimport Analysis.index
from Analysis.index import facility_co2, adjust_epa_emissions, group_fuel_cats
%aimport util.utils
from util.utils import rename_cols, add_facility_location
In [7]:
cwd = os.getcwd()
path = join(data_path, 'Derived data',
'Facility gen fuels and CO2 {}.csv'.format(file_date))
eia_fac = pd.read_csv(path)
In [8]:
rename_cols(eia_fac)
In [9]:
path = join(data_path, 'Derived data',
'Monthly EPA emissions {}.csv'.format(file_date))
epa = pd.read_csv(path)
In [10]:
fuel_cat_folder = join(data_path, 'Fuel categories')
state_cats_path = join(fuel_cat_folder, 'State_facility.json')
with open(state_cats_path, 'r') as f:
state_fuel_cat = json.load(f)
custom_cats_path = join(fuel_cat_folder, 'Custom_results.json')
with open(custom_cats_path, 'r') as f:
custom_fuel_cat = json.load(f)
In [11]:
co2, gen_fuels_state = facility_emission_gen(eia_facility=eia_fac, epa=epa,
state_fuel_cat=state_fuel_cat,
custom_fuel_cat=custom_fuel_cat,
export_state_cats=True)
In [12]:
%aimport Analysis.index
from Analysis.index import extra_emissions_gen
Total EIA generation/fuel consumption and emission factors
In [13]:
cwd = os.getcwd()
path = join(data_path, 'Derived data',
'EIA country-wide gen fuel CO2 {}.csv'.format(file_date))
eia_total = pd.read_csv(path)
path = join(data_path,
'Final emission factors.csv')
ef = pd.read_csv(path, index_col=0)
In [14]:
extra_co2, extra_gen_fuel = extra_emissions_gen(gen_fuels_state, eia_total, ef)
In [15]:
facility_co2 = co2.groupby(['year', 'month']).sum()
In [16]:
national_co2 = (facility_co2.loc[:, 'final co2 (kg)']
+ extra_co2.loc[:, 'elec fuel co2 (kg)']
.groupby(['year', 'month']).sum())
national_co2.name = 'final co2 (kg)'
In [17]:
gen_fuels_state.groupby(['type', 'year', 'month']).sum().head()
Out[17]:
In [18]:
national_gen = (gen_fuels_state
.groupby(['type', 'year', 'month'])['generation (mwh)'].sum()
.add(extra_gen_fuel['generation (mwh)'], fill_value=0))
In [19]:
national_gen = group_fuel_cats(national_gen.reset_index(), custom_fuel_cat,
'type', 'fuel category').set_index(['fuel category', 'year', 'month'])
In [20]:
total_gen = national_gen.groupby(['year', 'month']).sum()
In [21]:
national_index = total_gen.copy()
national_index['final co2 (kg)'] = national_co2
national_index['index (g/kwh)'] = (national_index['final co2 (kg)']
/ national_index['generation (mwh)'])
national_index.reset_index(inplace=True)
add_quarter(national_index)
g2lb(national_index)
change_since_2005(national_index)
In [22]:
path = join(data_path, 'National data',
'National index {}.csv'.format(file_date))
national_index.to_csv(path, index=False)
In [23]:
df_list = []
for fuel in national_gen.index.get_level_values('fuel category').unique():
percent_gen = national_gen.loc[fuel].divide(total_gen, fill_value=0)
percent_gen['fuel category'] = fuel
percent_gen.set_index('fuel category', inplace=True, append=True)
df_list.append(percent_gen)
percent_gen = pd.concat(df_list)
In [24]:
path = join(data_path, 'National data',
'National generation {}.csv'.format(file_date))
national_gen.to_csv(path)
path = join(data_path, 'National data',
'National percent gen {}.csv'.format(file_date))
percent_gen.to_csv(path)
Create a .csv with lat/lon and state code for each facility. Need to manually add the NERC region label. I've done this with a spatial join between the lat/lon and NERC shapefiles in QGIS. It should be possible to do the spatial join in GeoPandas. Could also add custom region labels (e.g. eGRID subregions, ISO/RTO boundaries, etc).
This isn't possible with a spatial join because NERC regions aren't strictly based on geography. Instead, I've taken the NERC codes from EIA-860 and assigned unknown plants (mostly those that retired before 2012 when modern NERCs were mostly defined, and those that were assigned a plant id in 2017) NERC labels using a k-nearest neighbors algorithm and lat/lon information.
See the Assign NERC region labels
notebook for the creation of this file.
In [25]:
%aimport Data.make_data
from Data.make_data import get_annual_plants
In [26]:
annual_ids_2015 = get_annual_plants(2015)
In [27]:
annual_ids_2017 = get_annual_plants(2017)
Combine the lists of annual plants in 2015 and 2017. This lets us catch facilities that have gone from monthly to annual since 2015, but it also includes plants that were annual in 2015 an may have retired.
There is the possibility of an error in allocation to NERC regions for 2016 state-level generation when using the facilities that changed to annual in 2017. But since the state-level generation is so much smaller in 2016 I don't think this is much of an issue.
In [28]:
annual_ids = set(annual_ids_2015.tolist() + annual_ids_2017.tolist())
In [57]:
len(annual_ids)
Out[57]:
In [58]:
%aimport Analysis.state2nerc
from Analysis.state2nerc import fraction_state2nerc, add_region
In [59]:
cwd = os.getcwd()
path = join(data_path, 'Facility labels',
'Facility locations_RF.csv')
location_labels = pd.read_csv(path)
In [60]:
nerc_state_path = join(data_path, 'Derived data',
'NERC_states.json')
with open(nerc_state_path, 'r') as f:
nerc_states = json.load(f)
Added the filter that year must be 2015 - was getting all 2015 annual plants, but for all years!
In [61]:
eia_2015_annual = eia_fac.loc[(eia_fac['plant id'].isin(annual_ids)) &
(eia_fac['year'] == 2015)].copy()
# Group to state-level fuel categories
eia_2015_annual = group_fuel_cats(eia_2015_annual, state_fuel_cat)
In [62]:
eia_2015_annual_nerc = add_facility_location(eia_2015_annual, location_labels,
labels=['state', 'nerc', 'year'])
In [63]:
eia_2015_annual_nerc.head()
Out[63]:
This is 2015 data on annual reporting facilities (from both 2015 and 2017)
In [64]:
# Get a list of all state abbreviations
all_states = []
for value in nerc_states.values():
all_states.extend(value)
all_states = set(all_states)
In [65]:
df_list = []
for state in all_states:
try:
df_list.append(fraction_state2nerc(eia_2015_annual_nerc,
state, region_col='nerc', fuel_col='type'))
except:
print(state)
pass
In [66]:
nerc_fraction = pd.concat(df_list)
nerc_fraction.set_index(['state', 'nerc', 'type'], inplace=True)
nerc_fraction.sort_index(inplace=True)
With the values below I can allocate extra state-level generation and fuel use to each of the NERC regions!
Making sure that no values are greater than 1 (within tolerance)
In [67]:
(nerc_fraction.groupby(['state', 'type']).sum() > 1.0001).any()
Out[67]:
In [68]:
idx = pd.IndexSlice
In [69]:
# a dictionary to match column names
nerc_frac_match = {'% generation': 'generation (mwh)',
'% total fuel': 'total fuel (mmbtu)',
'% elec fuel': 'elec fuel (mmbtu)'}
In [70]:
path = join(data_path, 'Derived data',
'EIA state-level gen fuel CO2 {}.csv'.format(file_date))
state_total = pd.read_csv(path, parse_dates=['datetime'])
In [71]:
rename_cols(state_total)
state_total['state'] = state_total['geography'].str[-2:]
Simplify the dataframe
In [72]:
cols = list(nerc_frac_match.values())
state_total = state_total.groupby(['state', 'year', 'month', 'type'])[cols].sum()
In [73]:
# list of NERC regions
nercs = nerc_fraction.index.get_level_values('nerc').unique()
In [74]:
cols = list(nerc_frac_match.values())
eia_fac_type = group_fuel_cats(eia_fac, state_fuel_cat)
eia_fac_type = add_facility_location(eia_fac_type, location_labels, ['state', 'year'])
eia_fac_type = eia_fac_type.groupby(['state', 'year', 'month', 'type'])[cols].sum()
In [75]:
state_extra = (state_total.loc[idx[:, 2016:, :, :], :]
- eia_fac_type.loc[idx[:, 2016:, :, :], :])
state_extra.dropna(how='all', inplace=True)
state_extra = state_extra.reorder_levels(['year', 'state', 'month', 'type'])
state_extra.sort_index(inplace=True)
Sort the index of each dataframe to make sure they can be easily combined.
In [76]:
nerc_fraction.sort_index(inplace=True)
In [77]:
state_extra.sort_index(inplace=True)
Create a copy of the nerc_fraction
dataframe with repeated values for every month of the year, so that they MultiIndex matches the state_extra
MultiIndex
In [78]:
df_list = []
for month in range(1, 13):
df = nerc_fraction.copy()
df['month'] = month
df.set_index('month', append=True, inplace=True)
df_list.append(df)
nerc_frac_monthly = pd.concat(df_list, axis=0)
nerc_frac_monthly.sort_index(inplace=True)
nerc_frac_monthly = (nerc_frac_monthly
.reorder_levels(['nerc', 'state', 'month', 'type']))
Cycle through each year (2016 and 2017 in this case) and each NERC, multiplying the state-level extra generation, total fuel consumption, and fuel consumption for electricity by the share that should be allocated to each NERC.
In [79]:
df_list_outer = []
for year in [2016, 2017]:
df_list_inner = []
for nerc in nercs:
df = pd.concat([(nerc_frac_monthly
.loc[nerc]['% generation']
* state_extra
.loc[year]['generation (mwh)']).dropna(),
(nerc_frac_monthly.
loc[nerc]['% total fuel']
* state_extra
.loc[year]['total fuel (mmbtu)']).dropna(),
(nerc_frac_monthly
.loc[nerc]['% elec fuel']
* state_extra
.loc[year]['elec fuel (mmbtu)']).dropna()],
axis=1)
df.columns = nerc_frac_match.values()
df['nerc'] = nerc
df['year'] = year
df = df.groupby(['year', 'nerc', 'month', 'type']).sum()
df_list_inner.append(df)
df_list_outer.append(pd.concat(df_list_inner))
final = pd.concat(df_list_outer)
final.sort_index(inplace=True)
In [80]:
nerc_frac_monthly.sort_index(inplace=True)
Although the dataframe is called final
, it's really just the final allocated extra state-level generation/fuel consumption
In [81]:
path = join(data_path, 'Derived data',
'NERC extra gen fuels {}.csv'.format(file_date))
final.to_csv(path)
In [ ]: